The Likelihood Function

What can we say about our parameters using this function?

\[ \begin{align*} \mathcal{L}(\boldsymbol{\theta}|y) = P(y|\boldsymbol{\theta}) = f(y|\boldsymbol{\theta}) \end{align*} \]

. . .

The likelihood (\(\mathcal{L}\)) of the unknown parameters, given our data, can be calculated using our probability function.

. . .

CODE:

# A data point
  y=c(10)

# The likelihood that the mean is 8, given our data (dnorm uses sd = 1 by default)
  dnorm(y,mean=8)
[1] 0.05399097


. . .

If we knew the mean were truly 8, this value would also be the probability density of the observation y = 10.
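
As a check on the `dnorm` output above, the same number can be worked out by hand from the Normal density. This is a sketch that assumes \(\sigma = 1\), which is the default `sd` in `dnorm`:

\[ \begin{align*} f(y = 10|\mu = 8,\sigma = 1) &= \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(10-8)^2} = \frac{e^{-2}}{\sqrt{2\pi}} \approx 0.054 \end{align*} \]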

Many Parameter Guesses

# Let's take many guesses of the mean
  means=seq(0,20,by=0.1)

# Use dnorm to get likelihood of each guess of the mean
# Assumes sd = 1
  likelihood=dnorm(y, mean=means)
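
One way to read these guesses is to ask which one gives the largest likelihood. The following is a minimal sketch that repeats the setup above so it runs on its own; `best_mean` is a name introduced here for illustration, and `sd = 1` is written out explicitly as an assumption.

```r
# A data point
y <- 10

# Many guesses of the mean
means <- seq(0, 20, by = 0.1)

# Likelihood of each guess (assumes sd = 1, the dnorm default)
likelihood <- dnorm(y, mean = means, sd = 1)

# The guess with the largest likelihood
best_mean <- means[which.max(likelihood)]
best_mean
```

With a single observation, the likelihood peaks at the guess equal to the data point itself.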

. . .

Statistics and PDF Example

What is the mean height of King Penguins?

We go and collect data,

\(\boldsymbol{y} = \begin{bmatrix} 4.34 & 3.53 & 3.75 \end{bmatrix}\)


. . .

Let’s decide to use the Normal Distribution as our PDF.

. . .

\[ \begin{align*} f(y_1 = 4.34|\mu,\sigma) &= \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{y_{1}-\mu}{\sigma})^2} \end{align*} \]

. . .

AND

\[ \begin{align*} f(y_2 = 3.53|\mu,\sigma) &= \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{y_{2}-\mu}{\sigma})^2} \end{align*} \]

. . .

AND

\[ \begin{align*} f(y_3 = 3.75|\mu,\sigma) &= \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{y_{3}-\mu}{\sigma})^2} \end{align*} \]
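
To connect the formula to R, the density can be written out by hand and compared with `dnorm`. This is only a sketch: `normal_pdf` is a hypothetical helper name, and the guesses \(\mu = 3\), \(\sigma = 1\) are chosen purely for illustration.

```r
# Normal probability density, written directly from the formula
normal_pdf <- function(y, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-0.5 * ((y - mu) / sigma)^2)
}

# Density of the first observation under illustrative guesses mu = 3, sigma = 1
by_hand  <- normal_pdf(4.34, mu = 3, sigma = 1)
built_in <- dnorm(4.34, mean = 3, sd = 1)

# TRUE if the hand-written formula and dnorm agree
all.equal(by_hand, built_in)
```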

. . .

Or simply,

\[ \textbf{y} \stackrel{iid}{\sim} \text{Normal}(\mu, \sigma) \]

. . .

\(iid\) = independent and identically distributed

. . .

Continued

The joint probability of our data with shared parameters \(\mu\) and \(\sigma\),

\[ \begin{align*} & P(Y_{1} = y_1,Y_{2} = y_2, Y_{3} = y_3 | \mu, \sigma) \\ &= \mathcal{L}(\mu, \sigma|\textbf{y}) \end{align*} \]

. . .

If each \(y_{i}\) is independent, the joint probability of our data is simply the product of all three probability densities,

\[ \begin{align*} =& f(y_{1}|\mu, \sigma)\times f(y_{2}|\mu, \sigma)\times f(y_{3}|\mu, \sigma) \end{align*} \]

We can do this because we assume that knowing one value (\(y_1\)) tells us nothing new about another value (\(y_2\)).

. . .

\[ \begin{align*} =& \prod_{i=1}^{3} f(y_{i}|\mu, \sigma) \\ =& \mathcal{L}(\mu, \sigma|y_{1},y_{2},y_{3}) \end{align*} \]

Code

Translate the math to code…

# penguin height data
  y=c(4.34, 3.53, 3.75)

#Joint likelihood of mu=3, sigma =1, given our data
  prod(dnorm(y,mean=3,sd=1))
[1] 0.01696987


. . .

Calculate the likelihood of many guesses of \(\mu\) and \(\sigma\) simultaneously,

# The Guesses
  mu=seq(0,6,0.05)
  sigma=seq(0.01,2,0.05)
  try=expand.grid(mu,sigma)
  colnames(try)=c("mu","sigma")

# function
fun=function(a,b){
  prod(dnorm(y,mean=a,sd=b))
  }

# mapply the function with the inputs
  likelihood=mapply(a=try$mu,b=try$sigma, FUN=fun)

# maximum likelihood of parameters
  try[which.max(likelihood),]
      mu sigma
925 3.85  0.36
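
As a sanity check on the grid search, the Normal model also has closed-form maximum likelihood estimates: the sample mean for \(\mu\), and the square root of the mean squared deviation for \(\sigma\). A minimal sketch follows; `mu_hat` and `sigma_hat` are names introduced here for illustration.

```r
# Penguin height data (the same three values as above)
y <- c(4.34, 3.53, 3.75)

# Closed-form maximum likelihood estimate of mu: the sample mean
mu_hat <- mean(y)

# Closed-form maximum likelihood estimate of sigma.
# Note: this divides by n, not by n - 1 as sd() does.
sigma_hat <- sqrt(mean((y - mu_hat)^2))

c(mu = mu_hat, sigma = sigma_hat)
```

These come out at roughly 3.87 and 0.34, close to the grid answer of 3.85 and 0.36; the gap reflects the 0.05 spacing of the guesses.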


. . .

Likelihood plot (3D)


Sample Size

What happens to the likelihood if we increase the sample size to N=100?

. . .